Thank you for your interest in the EPSM dataset! =) 

We hope it will be helpful in your research. This is an ongoing project that we will update on a yearly basis. We would greatly appreciate your feedback, especially if you spot typos or would like to share relevant data sources. Because some observations are coded with uncertainty, such sources could help us reduce missing values or revisit coding decisions.

This file accompanies the data and replication code for del Río, Knutsen and Lutscher (2024), "Education Policies and Systems across Modern History: A Global Dataset," which has been accepted at the journal Comparative Political Studies.


Replication materials include two folders: 

1. The EPSM dataset folder includes: 
  * The EPSM dataset_full, which is the dataset with notes containing coding decisions, background information and uncertainty measures
  * The EPSM dataset_clean, which is the dataset without the notes containing coding decisions, background information or uncertainty measures
  * The EPSM Codebook 
  * The EPSM Rule of thumb document
  * The datasources dataset, which lists the data sources employed (some links might not work or were removed because they no longer work, e.g., IBEI-UNESCO reports). Note that these are the sources that helped us make a coding decision. Discarded sources, or those that do not provide enough information, are not included in the dataset (e.g., some international yearbooks of education consulted to code some countries).

**** Unfortunately, copyright restrictions do not allow us to share a folder containing all the data sources by country. Please contact the authors if you cannot locate a source listed in the datasources dataset or want to discuss coding decisions. ****

2. The replication folder includes: 

  2.1. The input folder has the EPSM dataset, the dataset of data sources and V-Dem's data version 12. These datasets are employed to conduct all the analyses in the article and appendix. 

  2.2. The "descriptive" folder contains Descriptives.R, the R script with the code for the descriptive analyses: Figure 1, the analyses in the section "describing historical trends in education systems," Figure 10, and Appendices C, D and E.

  2.3. The "data_validation" folder includes validation.R, the R script that replicates the analysis comparing EPSM data with Paglayan's (2017) data. It uses the timing.xlsx dataset, EPSM data and Neundorf et al.'s (2023) V-Indoc dataset, and it covers the additional analyses in Appendix B (Figures SM5-8).

  2.4. The "ansell_lindvall" folder includes the validation.R script with the code used to compare Ansell & Lindvall's data (ansell_lindvall_data) and the EPSM dataset (Figure 4, and Figures SM3 and SM4 in Appendix B).

  2.5. The "paglayan" folder replicates Paglayan's (2021) analyses at the global level (Figure 9 in the main article, and the tables and figures in Appendix G).

  2.6. The "aleman_and_Kim" folder includes the replication and extension of Aleman and Kim (2015) in Table 3 of the main article. The R script is aleman_Kim.R, and the data used are in aleman_kim_2015_dataset.dta in the "aleman_kim_democracy_education" folder.

  2.7. The "inmmigration" folder extends Cavaille and Marshall's (2019) analyses by using the World Values Survey (Waves 4-7, in the input folder) and EPSM data. replication_script2.R contains the code to replicate the analyses in the "Education and political attitudes" section (Figure 11 and the plots in Appendix F).

  2.8. The "inequality_and_education" folder replicates and extends Samuels and Vargas's (2023) article. samuels.R is the R script with the code to produce Table 4 in the main analysis. The folder also contains Samuels and Vargas's replication folder with the data and scripts needed to conduct our article's analyses.

  2.9. The "source_validation" folder contains, in the "data sources" folder, the replication of the analyses of the data sources employed (Appendix A2). Please run the sources validation.R script to replicate the analysis. The "source_validation" folder also contains the "uncertainty" folder, which replicates the Appendix A3 analyses through the code in the uncertainty.R script.

  2.10. The "missing values" folder includes na_analysis.R, the R script that analyzes the distribution of missing values in Appendix A4.

